Sharp Computational-Statistical Phase Transitions via Oracle Computational Model
We study the fundamental tradeoffs between computational tractability and
statistical accuracy for a general family of hypothesis testing problems with
combinatorial structures. Based upon an oracle model of computation, which
captures the interactions between algorithms and data, we establish a general
lower bound that explicitly connects the minimum testing risk under
computational budget constraints with the intrinsic probabilistic and
combinatorial structures of statistical problems. This lower bound mirrors the
classical statistical lower bound by Le Cam (1986) and allows us to quantify
the optimal statistical performance achievable given limited computational
budgets in a systematic fashion. Under this unified framework, we sharply
characterize the statistical-computational phase transition for two testing
problems, namely, normal mean detection and sparse principal component
detection. For normal mean detection, we consider two combinatorial structures,
namely, sparse sets and perfect matchings. For these problems, we identify
significant gaps between the optimal statistical accuracy that is achievable
under computational tractability constraints and the classical statistical
lower bounds. Compared with existing works on computational lower bounds for
statistical problems, which consider general polynomial-time algorithms on
Turing machines and rely on computational hardness hypotheses on problems such as
planted clique detection, we focus on the oracle computational model, which
covers a broad range of popular algorithms, and do not rely on unproven
hypotheses. Moreover, our result provides an intuitive and concrete
interpretation of the intrinsic computational intractability of
high-dimensional statistical problems. One byproduct of our result is a lower
bound for a strict generalization of the matrix permanent problem, which is of
independent interest.
Comment: 57 pages, 4 figures
Nonconvex Statistical Optimization: Minimax-Optimal Sparse PCA in Polynomial Time
Sparse principal component analysis (PCA) involves nonconvex optimization for
which the global solution is hard to obtain. To address this issue, one popular
approach is convex relaxation. However, such an approach may produce suboptimal
estimators due to the relaxation effect. To optimally estimate sparse principal
subspaces, we propose a two-stage computational framework named "tighten after
relax": Within the 'relax' stage, we approximately solve a convex relaxation of
sparse PCA with early stopping to obtain a desired initial estimator; For the
'tighten' stage, we propose a novel algorithm called sparse orthogonal
iteration pursuit (SOAP), which iteratively refines the initial estimator by
directly solving the underlying nonconvex problem. A key concept of this
two-stage framework is the basin of attraction. It represents a local region
within which the 'tighten' stage has the desired computational and statistical
guarantees. We prove that the initial estimator obtained from the 'relax'
stage falls into such a region, and hence SOAP geometrically converges to a
principal subspace estimator which is minimax-optimal within a certain model
class. Unlike most existing sparse PCA estimators, our approach applies to the
non-spiked covariance models, and adapts to non-Gaussianity as well as
dependent data settings. Moreover, through analyzing the computational
complexity of the two stages, we illustrate an interesting phenomenon: a
larger sample size can reduce the total iteration complexity. Our framework
motivates a general paradigm for solving many complex statistical problems
which involve nonconvex optimization with provable guarantees.
Comment: 64 pages, 8 figures
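The 'tighten' stage lends itself to a compact sketch. Below is a minimal, hypothetical implementation of a SOAP-style sparse orthogonal iteration: a power step on the sample covariance, orthonormalization, and row-wise truncation to s coordinates. The paper's exact truncation rule, step choices, and stopping criteria may differ; the initializer U0 is assumed to come from the 'relax' stage.

```python
import numpy as np

def soap(sigma_hat, U0, s, n_iter=100):
    """Sketch of a SOAP-style sparse orthogonal iteration.
    sigma_hat: d x d sample covariance; U0: d x k initializer from the
    'relax' stage; s: number of coordinates (rows) kept per iteration."""
    U = U0.copy()
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(sigma_hat @ U)    # power step + orthonormalization
        norms = np.linalg.norm(Q, axis=1)     # row norms rank coordinate importance
        keep = np.argsort(norms)[-s:]         # indices of the s largest rows
        T = np.zeros_like(Q)
        T[keep] = Q[keep]                     # zero out the remaining rows
        U, _ = np.linalg.qr(T)                # re-orthonormalize the sparse iterate
    return U                                  # d x k sparse orthonormal basis
```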
Sparse Principal Component Analysis for High Dimensional Vector Autoregressive Models
We study sparse principal component analysis for high dimensional vector
autoregressive time series under a doubly asymptotic framework, which allows
the dimension to scale with the series length. We treat the transition
matrix of the time series as a nuisance parameter and directly apply sparse
principal component analysis to the multivariate time series as if the data are
independent. We provide explicit non-asymptotic rates of convergence for
leading eigenvector estimation and extend this result to principal subspace
estimation. Our analysis illustrates that the spectral norm of the transition
matrix plays an essential role in determining the final rates. We also
characterize sufficient conditions under which sparse principal component
analysis attains the optimal parametric rate. Our theoretical results are
backed up by thorough numerical studies.
Comment: 28 pages
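As a rough illustration of "apply sparse PCA as if the data were independent," the sketch below forms the sample covariance of the observed series and runs a truncated power method for the leading s-sparse eigenvector. The initialization, sparsity level s, and iteration count are placeholders, not the paper's prescriptions.

```python
import numpy as np

def leading_sparse_eigvec(X, s, n_iter=200):
    """Sketch: treat the VAR observations X (T x d) as if i.i.d., form the
    sample covariance, and run a truncated power method for the leading
    s-sparse eigenvector."""
    T, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / T                        # sample covariance, dependence ignored
    v = np.ones(d) / np.sqrt(d)              # simple initialization (assumption)
    for _ in range(n_iter):
        w = S @ v
        idx = np.argsort(np.abs(w))[:-s]     # zero out all but the s largest entries
        w[idx] = 0.0
        v = w / np.linalg.norm(w)
    return v
```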
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems
We provide theoretical analysis of the statistical and computational
properties of penalized M-estimators that can be formulated as the solution
to a possibly nonconvex optimization problem. Many important estimators fall in
this category, including least squares regression with nonconvex
regularization, generalized linear models with nonconvex regularization and
sparse elliptical random design regression. For these problems, it is
intractable to calculate the global solution due to the nonconvex formulation.
In this paper, we propose an approximate regularization path-following method
for solving a variety of learning problems with nonconvex objective functions.
Under a unified analytic framework, we simultaneously provide explicit
statistical and computational rates of convergence for any local solution
attained by the algorithm. Computationally, our algorithm attains a global
geometric rate of convergence for calculating the full regularization path,
which is optimal among all first-order algorithms. Unlike most existing methods
that only attain geometric rates of convergence for one single regularization
parameter, our algorithm calculates the full regularization path with the same
iteration complexity. In particular, we provide a refined iteration complexity
bound to sharply characterize the performance of each stage along the
regularization path. Statistically, we provide sharp sample complexity analysis
for all the approximate local solutions along the regularization path. In
particular, our analysis improves upon existing results by providing a more
refined sample complexity bound as well as an exact support recovery result for
the final estimator. These results show that the final estimator attains an
oracle statistical property due to the use of the nonconvex penalty.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/14-AOS1238 by the Institute of Mathematical
Statistics (http://www.imstat.org).
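The path-following idea can be sketched for one concrete member of this family, MCP-penalized least squares: the regularization parameter decreases geometrically from a value at which zero is optimal down to the target, and each stage runs warm-started proximal gradient steps. This is only a schematic instance with assumed constants and stopping rules, not the paper's exact algorithm.

```python
import numpy as np

def mcp_prox(z, lam, gamma, step):
    """Proximal operator of the MCP penalty (regularization lam, concavity
    gamma) for a gradient step of size `step`; requires step < gamma."""
    return np.where(
        np.abs(z) <= gamma * lam,
        np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0) / (1.0 - step / gamma),
        z,
    )

def path_following(X, y, lam_target, n_stages=10, gamma=3.0, n_inner=100):
    """Sketch of approximate regularization path following for MCP-penalized
    least squares: lambda decreases geometrically and each stage is
    warm-started from the previous solution."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the loss
    lam_init = np.max(np.abs(X.T @ y)) / n        # lambda at which 0 is optimal
    beta = np.zeros(d)
    for lam in np.geomspace(lam_init, lam_target, n_stages):   # path stages
        for _ in range(n_inner):                  # warm-started proximal gradient
            grad = X.T @ (X @ beta - y) / n
            beta = mcp_prox(beta - step * grad, lam, gamma, step)
    return beta
```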
Statistical Limits of Convex Relaxations
Many high dimensional sparse learning problems are formulated as nonconvex
optimization. A popular approach to solve these nonconvex optimization problems
is through convex relaxations such as linear and semidefinite programming. In
this paper, we study the statistical limits of convex relaxations.
In particular, we consider two problems: mean estimation for a sparse principal
submatrix and edge probability estimation for the stochastic block model. We
exploit the sum-of-squares relaxation hierarchy to sharply characterize the
limits of a broad class of convex relaxations. Our result shows that statistical
optimality needs to be compromised to achieve computational tractability
using convex relaxations. Compared with existing results on computational lower
bounds for statistical problems, which consider general polynomial-time
algorithms and rely on computational hardness hypotheses on problems like
planted clique detection, our theory focuses on a broad class of convex
relaxations and does not rely on unproven hypotheses.
Comment: 22 pages
Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
Solving statistical learning problems often involves nonconvex optimization.
Despite the empirical success of nonconvex statistical optimization methods,
their global dynamics, especially convergence to the desirable local minima,
remain less well understood in theory. In this paper, we propose a new analytic
paradigm based on diffusion processes to characterize the global dynamics of
nonconvex statistical optimization. As a concrete example, we study stochastic
gradient descent (SGD) for the tensor decomposition formulation of independent
component analysis. In particular, we cast different phases of SGD into
diffusion processes, i.e., solutions to stochastic differential equations.
Initialized from an unstable equilibrium, the global dynamics of SGD transition
through three consecutive phases: (i) an unstable Ornstein-Uhlenbeck process
slowly departing from the initialization, (ii) the solution to an ordinary
differential equation, which quickly evolves towards the desirable local
minimum, and (iii) a stable Ornstein-Uhlenbeck process oscillating around the
desirable local minimum. Our proof techniques are based upon Stroock and
Varadhan's weak convergence of Markov chains to diffusion processes, which are
of independent interest.
Comment: Appeared in NIPS 201
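To make the SGD iteration concrete, here is a minimal sketch of projected stochastic gradient ascent on the fourth-moment objective E[(w^T x)^4] over the unit sphere, one standard online formulation of tensor-based ICA for a single component. The step size, initialization, and sign convention (positive-kurtosis sources) are assumptions rather than the paper's setup.

```python
import numpy as np

def online_ica(stream, d, eta=1e-3, rng=None):
    """Sketch of projected SGD for one ICA component: ascend E[(w^T x)^4]
    on the unit sphere, one data point per step."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                   # start on the unit sphere
    for x in stream:                         # stream of d-dimensional samples
        w = w + eta * (w @ x) ** 3 * x       # stochastic gradient of (w^T x)^4 / 4
        w /= np.linalg.norm(w)               # project back onto the sphere
    return w
```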
High-dimensional Varying Index Coefficient Models via Stein's Identity
We study the parameter estimation problem for a varying index coefficient
model in high dimensions. Unlike most existing works, which iteratively
estimate the parameters and link functions, we propose, based on the
generalized Stein's identity, computationally efficient estimators for the
high-dimensional parameters without estimating the link functions. We consider
two different setups where we either estimate each sparse parameter vector
individually or estimate the parameters simultaneously as a sparse or low-rank
matrix. For all these cases, our estimators are shown to achieve optimal
statistical rates of convergence (up to logarithmic terms in the low-rank
setting). Moreover, throughout our analysis, we only require the covariate to
satisfy certain moment conditions, which is significantly weaker than the
Gaussian or elliptically symmetric assumptions that are commonly made in the
existing literature. Finally, we conduct extensive numerical experiments to
corroborate the theoretical results.
Comment: 44 pages
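As a loose illustration of the Stein's-identity idea, stripped down to a single sparse index vector rather than the paper's full varying index coefficient setting, the sketch below averages the response against the covariate score function and hard-thresholds the result; score_fn is assumed known.

```python
import numpy as np

def stein_direction_estimate(X, y, score_fn, s):
    """Simplified first-order Stein's-identity sketch for one sparse index:
    average y * S(x), where S(x) = -grad log p(x) is the covariate score,
    then keep the s largest coordinates."""
    g = np.mean(y[:, None] * score_fn(X), axis=0)   # E[y S(x)] is proportional to beta
    idx = np.argsort(np.abs(g))[:-s]
    g[idx] = 0.0                                    # hard-threshold to s entries
    return g / np.linalg.norm(g)                    # direction identified up to scale
```

For standard Gaussian covariates the score is the identity map, so score_fn = lambda X: X recovers the classical first-order Stein-type estimator.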
Optimal linear estimation under unknown nonlinear transform
Linear regression studies the problem of estimating a model parameter
$\beta^* \in \mathbb{R}^p$ from $n$ observations $\{(y_i, x_i)\}_{i=1}^n$
generated by the linear model $y_i = \langle x_i, \beta^* \rangle + \epsilon_i$.
We consider a significant generalization in which the relationship between
$\langle x_i, \beta^* \rangle$ and $y_i$ is noisy, quantized to a single bit,
potentially nonlinear, noninvertible, as well as unknown. This model is known
as the single-index model in statistics, and, among other things, it represents
a significant generalization of one-bit compressed sensing. We propose a novel
spectral-based estimation procedure and show that we can recover $\beta^*$ in
settings (i.e., classes of link function $f$) where previous algorithms fail.
In general, our algorithm requires only very mild restrictions on the (unknown)
functional relationship between $\langle x_i, \beta^* \rangle$ and $y_i$. We
also consider the high-dimensional setting where $\beta^*$ is sparse, and
introduce a two-stage nonconvex framework that addresses estimation challenges
in high-dimensional regimes where $p \gg n$. For a broad class of link
functions between $\langle x_i, \beta^* \rangle$ and $y_i$, we establish
minimax lower bounds that demonstrate the optimality of our estimators in both
the classical and high-dimensional regimes.
Comment: 25 pages, 3 figures
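One standard spectral construction of this flavor, assuming standard Gaussian designs, forms the response-weighted second-moment matrix and takes its leading eigenvector; the paper's actual estimator and its conditions may differ in detail.

```python
import numpy as np

def spectral_single_index(X, y):
    """Sketch of a second-moment spectral estimator for a single-index model:
    form M = (1/n) * sum_i y_i (x_i x_i^T - I) and return the eigenvector
    of M with the largest-magnitude eigenvalue."""
    n, d = X.shape
    M = (X.T * y) @ X / n - np.mean(y) * np.eye(d)   # (1/n) sum y_i (x_i x_i^T - I)
    vals, vecs = np.linalg.eigh(M)
    i = np.argmax(np.abs(vals))                      # eigenvalue largest in magnitude
    return vecs[:, i]
```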
Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima
Temporal-difference learning (TD), coupled with neural networks, is among the
most fundamental building blocks of deep reinforcement learning. However, due
to the nonlinearity in value function approximation, such a coupling leads to
nonconvexity and even divergence in optimization. As a result, the global
convergence of neural TD remains unclear. In this paper, we prove for the first
time that neural TD converges at a sublinear rate to the global optimum of the
mean-squared projected Bellman error for policy evaluation. In particular, we
show how such global convergence is enabled by the overparametrization of
neural networks, which also plays a vital role in the empirical success of
neural TD. Beyond policy evaluation, we establish the global convergence of
neural (soft) Q-learning, which is further connected to that of policy gradient
algorithms.
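A minimal sketch of the object being analyzed, semi-gradient TD(0) with an overparametrized two-layer ReLU value network in which only the input weights are trained, is given below; the architecture, step size, and data interface are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def neural_td0(transitions, dim, m=256, eta=1e-2, gamma=0.9, rng=None):
    """Sketch of semi-gradient TD(0) with V(s) = (1/sqrt(m)) * b^T relu(W s),
    where the output weights b are fixed at random and only W is trained."""
    rng = np.random.default_rng() if rng is None else rng
    W = rng.standard_normal((m, dim))            # trainable input weights
    b = rng.choice([-1.0, 1.0], size=m)          # fixed random output weights

    def value(s):
        return b @ np.maximum(W @ s, 0.0) / np.sqrt(m)

    for s, r, s_next in transitions:             # stream of (state, reward, next state)
        delta = r + gamma * value(s_next) - value(s)              # TD error
        active = (W @ s > 0).astype(float)                        # ReLU pattern at s
        grad = (b * active)[:, None] * s[None, :] / np.sqrt(m)    # dV(s)/dW
        W += eta * delta * grad                  # semi-gradient TD(0) update
    return value
```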
Sparse Generalized Eigenvalue Problem: Optimal Statistical Rates via Truncated Rayleigh Flow
The sparse generalized eigenvalue problem (GEP) plays a pivotal role in a large
family of high-dimensional statistical models, including sparse Fisher's
discriminant analysis, canonical correlation analysis, and sufficient dimension
reduction. Sparse GEP involves solving a nonconvex optimization problem. Most
existing methods and theory, developed in the context of specific statistical
models that are special cases of the sparse GEP, require restrictive structural assumptions
on the input matrices. In this paper, we propose a two-stage computational
framework to solve the sparse GEP. At the first stage, we solve a convex
relaxation of the sparse GEP. Taking the solution as an initial value, we then
exploit a nonconvex optimization perspective and propose the truncated Rayleigh
flow method (Rifle) to estimate the leading generalized eigenvector. We show
that Rifle converges linearly to a solution with the optimal statistical rate
of convergence for many statistical models. Theoretically, our method
significantly improves upon the existing literature by eliminating structural
assumptions on the input matrices for both stages. To achieve this, our
analysis involves two key ingredients: (i) a new analysis of the gradient-based
method on nonconvex objective functions, and (ii) a fine-grained
characterization of the evolution of sparsity patterns along the solution path.
Thorough numerical studies are provided to validate the theoretical results.
Comment: To appear in JRSS
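The second-stage iteration can be sketched as follows: ascend the generalized Rayleigh quotient v^T A v / v^T B v, truncate to the s largest entries, and renormalize, starting from the stage-one convex-relaxation solution v0. Step size, sparsity level, and iteration count below are placeholders rather than the paper's tuned choices.

```python
import numpy as np

def rifle(A, B, v0, s, eta=0.01, n_iter=500):
    """Sketch of a truncated Rayleigh flow (Rifle-style) iteration for the
    sparse GEP max_v (v^T A v)/(v^T B v) with an s-sparse leading
    generalized eigenvector."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        rho = (v @ A @ v) / (v @ B @ v)          # current Rayleigh quotient
        v = v + (eta / rho) * (A - rho * B) @ v  # gradient ascent step on the quotient
        idx = np.argsort(np.abs(v))[:-s]
        v[idx] = 0.0                             # truncate to the s largest entries
        v = v / np.linalg.norm(v)                # renormalize to the unit sphere
    return v
```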